Catalogue search • Linguistik portal • Fachinformationsdienst (FID)

1	MassiveSumm: a very large-scale, very multilingual, news summarisation dataset ...
	The 2021 Conference on Empirical Methods in Natural Language Processing 2021; Schluter, Natalie; Varab, Daniel. - : Underline Science Inc., 2021
	Anthology paper link: https://aclanthology.org/2021.emnlp-main.797/
	Abstract: Current research in automatic summarisation is unapologetically anglo-centred - a persistent state-of-affairs, which also predates neural net approaches. High-quality automatic summarisation datasets are notoriously expensive to create, posing a challenge for any language. However, with digitalisation, archiving, and social media advertising of newswire articles, recent work has shown how, with careful methodology application, large-scale datasets can now be simply gathered instead of written. In this paper, we present a large-scale multi-lingual summarisation dataset containing articles in 92 languages, spread across 28.8 million articles, in more than 35 writing scripts. This is both the largest, most inclusive, existing automatic summarisation dataset, as well as one of the largest, most inclusive, ever published datasets for any NLP task. We present the first investigation on the efficacy of resource building from news ...
	URL: https://dx.doi.org/10.48448/8thm-zg55 https://underline.io/lecture/37700-massivesumm-a-very-large-scale,-very-multilingual,-news-summarisation-dataset
	BASE

2	The Danish Gigaword Project ...
	Strømberg-Derczynski, Leon; Ciosici, Manuel R.; Baglini, Rebekah. - : arXiv, 2020
	BASE
